Abstract

Few algorithms for supervised training of spiking neural networks exist that can deal with 
patterns of multiple spikes, and their computational properties are largely unexplored. We 
demonstrate in a set of simulations that the ReSuMe learning algorithm can be successfully 
applied to layered neural networks. Input and output patterns are encoded as spike trains 
of multiple precisely timed spikes, and the network learns to transform the input trains into
target output trains. This is done by combining the ReSuMe learning algorithm with multiplicative
scaling of the connections of downstream neurons.

We show in particular that layered networks with one hidden layer can learn the basic
logical operations, including Exclusive-Or, while networks without hidden layer cannot,
mirroring an analogous result for layered networks of rate neurons.

While supervised learning in spiking neural networks is not yet fit for technical purposes,
exploring computational properties of spiking neural networks advances our understanding of
how computations can be done with spike trains.

Keywords: Spiking Neural Networks, Supervised Learning, Logical Operation, Spike Trains

1 Introduction 

Artificial neural networks are developed both as models of neural processing in nervous systems 
and as learning devices in artificial intelligence. Neural networks of rate neurons have found 
ample applications in industry because efficient general purpose learning algorithms exist. In our 
understanding, a general purpose algorithm can learn arbitrary mappings of input-output pattern 
pairs, subject to general constraints such as whether the input-output mapping is representable in
the network or crosstalk between similar input patterns [1], [13]. Examples of such general-purpose
algorithms for rate neurons are the family of backpropagation algorithms [25]. However, networks
of rate neurons are not biologically plausible as they do not show spiking behaviour.

On the other hand, spiking neural networks are biologically more plausible and serve mainly as 
models of nervous processing, but general purpose learning algorithms - in the way backpropagation
is applied to rate neurons - have not yet been found [10]. A general-purpose learning algorithm
for spiking networks should be able to map arbitrary spatio-temporal input spike patterns to arbitrary
output spike patterns. So far learning in these spiking networks is largely correlation-based,
that is, variants of Hebbian learning such as Spike-Timing-Dependent Plasticity (STDP) are typically
used to change synaptic weights.

In this paper we will present a series of simulations involving layered feedforward networks of 
spiking neurons and demonstrate that these are able to learn simple computations in a supervised 
way. We will use an encoding of input and output patterns that makes use of spike trains with 
strict spike times. In comparable settings, so far only classification tasks or simple mapping tasks 
have been considered [11], [14], [21], either with only a single neuron or in much larger Liquid State
Machines (LSMs) [15], but no computational tasks. Alternatively, computational tasks like the Exclusive-Or
problem have been considered in layered networks, but only with single-spike latency-encoded
outputs. In contrast, it is our aim to demonstrate that layered networks can learn to
perform simple, but non-trivial computations in a supervised framework and make use of multiple 
timed spikes for input and output patterns. 

In particular, as basic building blocks of computation, we demonstrate that these networks can 
learn logical operations when logical values false and true are encoded as spike trains both for 
inputs and outputs. While it can often be shown that (hand-coded) spiking neural networks can 
be Turing-equivalent [28], it is instructive to demonstrate that basic building blocks of such
computations can indeed be learnt. Already for rate neurons, theoretical Turing equivalence and
practical learnability of a problem may not coincide.

A key problem in neuroscience is to understand the neural code. Usually the approach is
"bottom-up", that is, spike trains are recorded and later analysed and checked for correlations in
rate and timing with experimental conditions, for example sensory stimuli [17]. However, many
areas of a nervous system might be so far remote from direct sensory stimuli, that it is difficult to 
detect such correlations and to understand what precise computational function a natural network 
implements and how. 

Therefore, besides as an initial step towards general-purpose learning for spiking neural net- 
works, the present article may also be seen as a top-down complement of these neuroscientific 
approaches. Under biologically inspired constraints on information processing, we explore whether 
simple types of computation can be performed with spike-train based encoding. While bio-inspired, 
our approach however takes into account neuroscientific detail on a coarse level only. 

The article is structured as follows: In section 2 we discuss basic properties of two
learning algorithms for spiking neurons and motivate our choice of ReSuMe. We present our
learning task in section 3. In section 4 we describe the details of the simulation setup. Section 5
presents and discusses a series of simulations on logical operations. In the final section we conclude by
embedding the results into their wider context.



2 Background 

Recently there have been interesting developments regarding supervised learning algorithms for 
spiking neural networks. Notably, there are the SpikeProp learning algorithm and ReSuMe.

While SpikeProp has been applied to layered feedforward networks, each neuron is restricted 
to only one spike during a certain period. Similar restrictions also apply to extensions of this 
algorithm [30]. SpikeProp essentially is a gradient-descent algorithm similar to backpropagation
for rate neurons. While rate neuron backpropagation uses the minimisation of Euclidean
distance between actual and target output activation to derive weight changes to minimise error, 
in SpikeProp the Euclidean distance of actual and target spike times plays the same role. As in 
standard backpropagation, SpikeProp weight changes for synaptic connections between neurons 
are given by the (anti-)gradient of the overall network error with respect to the weight. Such 
gradient descent algorithms overcome the credit assignment problem by utilising the chain rule of 
differentiation to derive error signals for downstream neurons. 

Applications of SpikeProp and its extensions have mainly been to classification tasks in layered 
networks where the early or late timing of a single output spike indicates the class [33]. SpikeProp's
application to the non-linearly separable Exclusive-Or problem also follows this pattern.

Generally for SpikeProp-based algorithms, it is crucial that hidden layer neurons are initialised 
such that they spike at least once for all patterns or no error signals for that neuron and its weight 
arise. In this sense, it is difficult to come up with a good weight initialisation independent of the 
task and the pattern encoding used [26].

Finally, Booij and tat Nguyen [1] suggest an extension of SpikeProp where multiple spikes are 
allowed in the hidden and input layers. They claim their extension is in principle also applicable 
for multiple output spikes, but - to our knowledge - it has never been successfully applied to any 
such task experimentally. Also our own preliminary simulations with SpikeProp failed for multiple 
output spikes. Therefore the simulations in this paper are based on ReSuMe, see section 4.



ReSuMe is motivated by an analogue of the δ-rule and is based on STDP [21]. From a combination
of an STDP process between target and input spike trains of a neuron and an anti-STDP
process between the actual output train and the input, it derives a differential STDP rule that is 
used to generate weight changes for the synaptic connections into a neuron. 

This algorithm is capable of training a single neuron to reproduce an arbitrary prescribed 
target spike train via supervised learning; however it needs a large number of incoming spikes to 
do so successfully [21]. Unlike SpikeProp, ReSuMe is able to deal with spike patterns that involve
many spikes. However, it can only be applied to neurons that have a direct target spike train
assigned, and the credit assignment for downstream neurons is circumvented by either training
single neurons or a layer of output neurons on top of a large immutable LSM [15]. Experimental
tasks include, for example, single neurons (or sets) producing a prescribed spike train from their
incoming spikes, or classifying spike patterns when trained neurons are used as readouts for an LSM
[21].



However, much smaller networks, similar to the layered feedforward networks used for Spike- 
Prop can also perform computations and transform spike trains as will be demonstrated in this 
article. Hence ReSuMe has the potential to be a general-purpose learning algorithm for patterns 
that are based on (arbitrary) spike trains. 



3 Task Overview 

This section is an overview of the learning task, the encoding and the network structure used. For 
details, see section 4. We concentrate on simple logical operations. These are simpler than real
world data, but it is also much clearer what type of computation has to be learnt in the network. 
We are primarily interested in these simulations as a proof of concept that computation with spike 
trains is possible, but logical operations are also at the heart of every symbolic computation, and 
it is instructive to analyse whether these basic building blocks can be learnt. 

Let $J_0$ and $J_1$ denote the inputs to a logical operation and $Q$ its output. Truth values false
and true both for input and output will be encoded as spike trains for a layered feedforward
network [23] of spiking neurons (see fig. 1). For the single output neuron $Q$ (in slight abuse of
notation), there are two target spike trains $S^{\mathrm{TRUE}}$ and $S^{\mathrm{FALSE}}$, standing for the two logical output
values. For the inputs, the network has two equally sized banks $J_0$ and $J_1$ of input neurons. Each
bank plays the role of one logical input to the network. For each bank $J_i$ the list of given spike
trains for the bank's individual neurons, $(S_j^{\mathrm{TRUE}})_{j \in J_i}$ or $(S_j^{\mathrm{FALSE}})_{j \in J_i}$, denotes collectively the logical value
input to this bank. For details of spike train choice, see section 4.6.

We have chosen to train the four operations TRUE, $J_0$, AND and XOR. As spike trains
are assigned randomly to their interpretations of false and true and to their bank, but have
otherwise identical properties, these four cases cover all 16 possible logical operations of two binary
variables. For example, OR can be derived from AND if logical values of all input and target
spike trains are inverted. TRUE and $J_0$ might seem trivial from a logic point of view, but they
are probably not for a spiking neural network:

TRUE is the logical operation that always has true as output, irrespective of its inputs. It tests
whether the network can produce the same (or at least a similar) output train for dissimilar
sets of input trains.

$J_0$ is the logical operation that always has the same value as its $J_0$ input. It tests whether the
network can ignore the input from bank $J_1$, which effectively is just noise in this task.

AND. The logical conjunction is linearly separable, can therefore be learned in a single-layer
perceptron, and is viewed as a simple computation to learn.

XOR. The Exclusive-Or operation is not linearly separable and is therefore considered more
difficult to learn than AND. It cannot be learnt with a simple perceptron and has frequently
been used to demonstrate the power of a learning algorithm [16].

Figure 1: Network structure and encoding of logical values. (a) A three-layer network with input,
hidden and output layer. The input layer consists of two banks $J_0$ and $J_1$. All neurons in a bank
collectively serve as one logical input. The firing pattern of neuron $Q$ represents the output value,
ff stands for the feedforward connections between layers. (b) Logical operations with spike trains.
A logical value false or true is encoded in a set of spike trains, and these are applied to the
input banks $J_0$ and $J_1$. The network learns to perform a logical operation (such as XOR) and
produces an output spike train which then needs to be interpreted as a logical value.

4 Methods

In this section we describe and motivate our experimental setup in more detail. 

ReSuMe is a supervised learning algorithm for single spiking neurons usually driven by a large 
number of input spike trains [21]. ReSuMe is not fixed to a particular neuron type, but, like STDP,
implicitly assumes, at least on longer time scales, that recent inputs have more influence on
the current activation of a neuron than past inputs. We present the general network structure,
fix a convenient notation and introduce ReSuMe in a form suitable for easy implementation with
discrete time steps as opposed to the integral formulation in continuous time in [21]. We also
address the problem of weight initialisation for downstream neurons. 



4.1 Networks 

We consider layered feedforward networks [23]. Although they are only feedforward and not
recurrent, they have a temporal dimension since they use spiking neurons and spike train dynamics
play out in time. Our simulations used networks with two (input, output) and three layers (input,
hidden and output). The two-layered networks are similar to the ones used for ReSuMe [21] and
the three-layered ones are similar to the ones used with SpikeProp.

Neurons within layers do not connect to each other, but fully connect to the subsequent layer: 
there are multiple connections $w_{XY,s}$ for all delays $s = 1\,\mathrm{ms}, \dots, 10\,\mathrm{ms}$ from any neuron $X$ in the
present layer to any neuron Y in the subsequent layer. The output layer consists of just a single 
neuron, and the input layer consists of "dummy" neurons (without any dynamics) which simply 
serve to feed input spike trains into the next layer. 

The connections between the input and output layer for the two-layered networks are subject 
to ReSuMe learning (eq. 1 and eq. 2). In the three-layer network only the connections from the
hidden layer to the output layer are subject to ReSuMe learning, while connections from the input
layer to the hidden layer are subject to rate adjustment according to eq. 4 further down.

ReSuMe needs a large number (hundreds) of incoming connections to function [21]. Like
SpikeProp, we instead use fewer inputs, but multiply incoming connections by having 10 weights
with different delays between any two neurons, so effectively we also achieve a high number of
incoming spikes to any neuron.



4.2 Notation 

Let $S_X$ denote the output spike train from neuron $X$. We understand the spike train as the
ordered set of spike times $t_i$ of $X$, i.e. $S_X = (t_i)$. A neuron $X$ undergoing supervised learning will
have two output spike trains associated with it, namely the train of actual spikes $S_X^{(a)} = (t_i^{(a)})$,
that is the list of times $t_i^{(a)}$ when it did actually spike, and the train $S_X^{(d)} = (t_i^{(d)})$ of desired spike
times $t_i^{(d)}$ when we want it to spike. Let $w_{XY}$ denote the weight of the synaptic connection from
presynaptic neuron $X$ to postsynaptic neuron $Y$. We distinguish multiple synaptic connections
between the same neurons $X$ and $Y$ with different delays $s$ by an additional index, that is $w_{XY,s}$.
If clear from the context which particular quantity we refer to or if the argumentation is generic,
we will leave out indices on weights $w_{XY}$, times $t_i$ and on other quantities introduced later.



4.3 Weight Changes in ReSuMe 

ReSuMe considers a single neuron Y that is driven by a number of incoming spike trains, either as 
direct input spike trains or trains from other neurons. It introduces a differential STDP process 
involving the desired and actual output spike trains $S_Y^{(d)}$ and $S_Y^{(a)}$ and all input spike trains. We
refer the reader to details of its derivation in [21] and present a formulation of ReSuMe that is
broken down to the effects of individual input-output spike pairs. 

More precisely, for each connection $w_{XY}$ there is an STDP process between the corresponding
input train $S_X$ and the desired output train $S_Y^{(d)}$. This process is complemented by an anti-STDP
process between the same incoming train $S_X$ and the actual output train $S_Y^{(a)}$. These processes
can be formulated quite generally with a number of parameters, however, we restrict ourselves here 
to a formulation with a reduced set of parameters where contributions of the two STDP processes 
are of equal magnitude. 

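The total weight change $\Delta w_{XY}$ resulting from the STDP processes between the trains is
the sum of all contributions of individual input-output spike pairs $(t_X, t_Y^{(d)}) \in S_X \times S_Y^{(d)}$ and
$(t_X, t_Y^{(a)}) \in S_X \times S_Y^{(a)}$ as follows:

$$\Delta w_{XY}^{(d)}(t_X, t_Y^{(d)}) = a_d + \begin{cases} A_{di}\, e^{-(t_Y^{(d)} - t_X)/\tau}, & t_X \le t_Y^{(d)} \\ -A_{id}\, e^{-(t_X - t_Y^{(d)})/\tau}, & t_X > t_Y^{(d)} \end{cases} \qquad (1)$$

with constants $a_d$, $A_{di}$, $A_{id}$ and $\tau > 0$. Similarly, the anti-STDP process between an input train and
the actual output train effects weight changes as

$$\Delta w_{XY}^{(a)}(t_X, t_Y^{(a)}) = -a_d + \begin{cases} -A_{di}\, e^{-(t_Y^{(a)} - t_X)/\tau}, & t_X \le t_Y^{(a)} \\ A_{id}\, e^{-(t_X - t_Y^{(a)})/\tau}, & t_X > t_Y^{(a)} \end{cases} \qquad (2)$$

With constants chosen the same as in eq. 1, $\Delta w_{XY}^{(a)}(t_X, t_Y) = -\Delta w_{XY}^{(d)}(t_X, t_Y)$ so that the two
processes are balanced. If the desired and actual spike times coincide, there is no further weight
change resulting from such a pair. The total weight change of $w_{XY}$ from spike trains $S_X$, $S_Y^{(d)}$
and $S_Y^{(a)}$ is the sum of all above contributions from all pairings of (desired and actual) output and
input spikes:

$$\Delta w_{XY} = \sum_{t_X \in S_X} \Bigg[ \sum_{t_Y^{(d)} \in S_Y^{(d)}} \Delta w_{XY}^{(d)}(t_X, t_Y^{(d)}) + \sum_{t_Y^{(a)} \in S_Y^{(a)}} \Delta w_{XY}^{(a)}(t_X, t_Y^{(a)}) \Bigg] \qquad (3)$$

If connections $w_{XY,s}$ have delays $s$ then in the above formulas $t_X + s$ replaces $t_X$.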

Learning parameters used were $a_d = 0$, $A_{di} = 0.0005$, $A_{id} = A_{di}$ and $\tau = 4$ ms in all cases.
Preliminary simulations had shown that for these values we could expect a reasonable convergence
of networks with three layers and that higher rates led to no stable convergence.

Note that in practice often $A_{id} = 0$ so that eq. 1 and eq. 2 only yield a non-zero contribution
for those presynaptic spikes $t_X$ that arrive before the current desired or actual spike $t_Y$ considered
[20]. Preliminary simulations in our setting showed that $A_{id} = 0$ did not work well, presumably
because $A_{id} > 0$ makes most difference for a connection $w_{XY}$ when for the incoming spike $t_X$
either $t_Y^{(d)} < t_X < t_Y^{(a)}$ or $t_Y^{(d)} > t_X > t_Y^{(a)}$, because in these cases eq. 1 and eq. 2 have the same sign.
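
As an illustration only, the pairwise formulation of eq. 1 to eq. 3 can be sketched in a few lines of Python. The function names `pair_change` and `resume_delta_w` are hypothetical, and the sketch assumes that the branch for input spikes arriving after the output spike carries a negative sign (the reading under which eq. 1 and eq. 2 share a sign for an input spike between desired and actual spike). Times are in ms.

```python
import math

def pair_change(t_x, t_y, a_d, A_di, A_id, tau):
    """Contribution of one (input, desired output) spike pair, read as eq. 1.

    Assumption: the post-before-pre branch enters with a negative sign.
    """
    if t_x <= t_y:  # input spike arrives before (or with) the output spike
        return a_d + A_di * math.exp(-(t_y - t_x) / tau)
    return a_d - A_id * math.exp(-(t_x - t_y) / tau)

def resume_delta_w(s_x, s_y_desired, s_y_actual, delay=0.0,
                   a_d=0.0, A_di=0.0005, A_id=0.0005, tau=4.0):
    """Total change of one connection (eq. 3): STDP with the desired train
    plus anti-STDP (the negated rule, eq. 2) with the actual train."""
    total = 0.0
    for t_x in s_x:
        t_in = t_x + delay  # a synaptic delay s shifts the input spike time
        for t_d in s_y_desired:
            total += pair_change(t_in, t_d, a_d, A_di, A_id, tau)
        for t_a in s_y_actual:
            total -= pair_change(t_in, t_a, a_d, A_di, A_id, tau)
    return total
```

With these defaults, an input spike shortly before a desired spike that the neuron failed to produce strengthens the connection, while the same input before a spurious actual spike weakens it; if actual and desired trains coincide the two sums cancel.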



4.4 Adjusting Spike Rates for Downstream Neurons 

We suggest a general natural method to overcome the problem of silent downstream neurons, that 
can hamper learning of upstream neurons. 

In SpikeProp many problems arise because neurons in the hidden layer do not fire, and it is not 
straightforward how to overcome this, other than by careful selection of initial weights, so that
all neurons fire at least one spike for all input patterns to the network. Thus weight initialisation
depends on the task [26]. Our layered network has in principle the same problem. Although
connections to the neurons in the hidden layer are not actively trained in the ReSuMe sense, firing 
of hidden neurons needs to be tuned to produce a sufficient number of incoming spikes for the 
output layer. 

Natural neurons can multiplicatively scale incoming synapses collectively to keep their output
firing rate within an acceptable range [27]. This natural scaling is adopted into our network: If we
set a target spike rate range $[r_{\min}, r_{\max}]$ for a neuron, weights are scaled when neuron $Y$'s average
rate $r_Y$ is outside this range:

$$w_{XY} \to \begin{cases} (1 + f)\, w_{XY}, & w_{XY} > 0 \\ \frac{1}{1 + f}\, w_{XY}, & w_{XY} < 0 \end{cases} \qquad (4)$$

with $f > 0$ for $r_Y < r_{\min}$ and $f < 0$ for $r_Y > r_{\max}$. If the hidden neurons act as preprocessors
of the input for the output neurons, it makes sense to hold their rates roughly between those of
the input spike trains and the desired output spike train (see below). We set $r_{\min} = 0.3$/ms and
$r_{\max} = 0.1$/ms with $f = \pm 0.05$.
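
A minimal sketch of this scaling step, assuming eq. 4 multiplies positive weights by $(1 + f)$ and divides negative weights by $(1 + f)$. The function name `scale_weights` and its argument names are hypothetical, and the rate bounds are passed in by the caller rather than fixed.

```python
def scale_weights(weights, rate, r_min, r_max, f=0.05):
    """Scale all incoming weights of one neuron if its average rate is out of range.

    Positive weights are multiplied by (1 + factor) and negative weights are
    divided by (1 + factor); factor is +f for a neuron that is too quiet and
    -f for a neuron that is too active.
    """
    if rate < r_min:
        factor = f
    elif rate > r_max:
        factor = -f
    else:
        return list(weights)  # rate within the target range: no change
    scaled = []
    for w in weights:
        if w > 0:
            scaled.append((1.0 + factor) * w)
        elif w < 0:
            scaled.append(w / (1.0 + factor))
        else:
            scaled.append(w)  # eq. 4 leaves zero weights untouched
    return scaled
```

For a silent neuron both branches push the net input upwards: excitatory weights grow and inhibitory weights shrink in magnitude.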



4.5 Neurons and Synapses 

All neurons in the network, except the input neurons, are standard Leaky-Integrate-and-Fire (LIF)
neurons:

$$\frac{dV}{dt} = -\frac{1}{\tau}(V - V_r) + \frac{1}{C} I \qquad (5)$$

where $V$ is the current membrane potential, $V_r = -60$ mV the resting potential. $C$ is the membrane
capacity, and with $R$ the membrane resistance, $\tau := RC$ is its time constant. Finally $I$ is the input
current. If $V$ exceeds $V_\theta = -55$ mV, the neuron fires a spike and the membrane potential is reset
to $V = -65$ mV. For simplicity, we do not enforce an absolute refractory period. Neurons are
pulse-coupled through synapses with a numeric weight $w_{XY,s}$ and a delay $s$, that is if neuron $X$
reaches the firing threshold $V_\theta$ at $t_X$, $Y$ gets a contribution $w_{XY,s}$ to its input current $I$ at $t_X + s$.

We simulate the neuron with a time resolution of $\Delta t = 1$ ms, and choose $R = 10\,\mathrm{M\Omega}$, $C = 1$ nF,
hence $\tau = 10$ ms. If we measure $V$ in [mV], $w$ in [nA] and times in [ms], then an incoming spike with
$w = 1$ nA with duration 1 ms (according to the time resolution) increases the membrane voltage
instantaneously by $\Delta V = w \Delta t / C = 1\,\mathrm{nA\,ms/nF} = 1$ mV. Hence with this choice of dimensions,
the numeric value of a weight corresponds to the numeric value of the instantaneous increase of 
the membrane voltage. We will therefore leave out dimensions on weights, potentials and times 
in the following. 
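
The neuron model can be sketched as follows. The text fixes only the time resolution of 1 ms, so the forward Euler update used here is an assumption, as is the hypothetical function name `simulate_lif`. Voltages are in mV, currents in nA, times in ms and the capacitance in nF.

```python
def simulate_lif(input_current, v_rest=-60.0, v_thresh=-55.0, v_reset=-65.0,
                 tau=10.0, capacitance=1.0, dt=1.0):
    """Forward Euler integration of eq. 5, one entry of input_current per step.

    Returns the list of spike times and the membrane potential after each step.
    No refractory period is enforced, as in the text.
    """
    v = v_rest
    spikes, trace = [], []
    for step, current in enumerate(input_current):
        v += dt * (-(v - v_rest) / tau + current / capacitance)
        if v > v_thresh:          # threshold crossing: emit spike and reset
            spikes.append(step * dt)
            v = v_reset
        trace.append(v)
    return spikes, trace
```

Starting from rest, a single input of weight 1 raises the potential by 1 mV in this scheme, in line with the choice of dimensions described above.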

All weights $w_{XY,s}$ in the network with delays $s = 1, \dots, 10$ are initialised uniformly from the
range $-0.02$ to $0.08$, deliberately chosen small so that no output spikes are produced until the
ReSuMe learning or the scaling of eq. 4 have increased the weights. The distribution is skewed towards
positive values to coarsely reflect the distribution of excitatory and inhibitory neurons in the brain, if
not in the type of neuron, at least in the type of connection. Weights are subject to changes
according to eq. 1 to eq. 3 or eq. 4, but are clipped to values within the range $[-2, 2]$ so that several
spikes need to contribute to a neuron's firing. Weights can change seamlessly from excitatory
(positive values) to inhibitory (negative values) and vice versa.



4.6 Spike train 

We create spike trains for inputs and outputs that stand for logical values false and true. For 
rate neuron networks it is known that they frequently fail to discriminate between input patterns 
that are too similar. Preliminary simulations in our spike train setting showed that this is also 
the case here. In addition, actual output spike trains tend to have additional or missing spikes 
compared to desired spike trains. Therefore to ensure a good degree of dissimilarity between spike 
trains for the different logical values we proceed as follows: 

For each input or output neuron, first a single spike train $S_0$ is created with constant spike
probability per time slot $r = 0.2$/ms for input trains and 0.06/ms for output trains, both with a
minimum Inter-Spike Interval (ISI) of 10 ms (mimicking a refractory period). From train $S_0$ two
new trains $S^{\mathrm{TRUE}}$ and $S^{\mathrm{FALSE}}$ are created by randomly distributing all spikes from $S_0$ over $S^{\mathrm{TRUE}}$
and $S^{\mathrm{FALSE}}$. This ensures that for a spike $t \in S^{\mathrm{TRUE}}$ there are no spikes in $S^{\mathrm{FALSE}}$ in the interval
$[t - 10, t + 10]$ and vice versa.

Spike train pairs $(S^{\mathrm{TRUE}}, S^{\mathrm{FALSE}})$ of duration 100 ms for logical zeros and ones are created in this way
independently for all input neurons without any further constraints. Truth value patterns for an
input bank are just the set of the respective trains for the bank's individual input neurons. For
the output neuron, spike train pairs of 100 ms duration are created in the same way, but only those
are selected that have no spikes within the first 20 ms and in which each train $S^{\mathrm{TRUE}}$ and $S^{\mathrm{FALSE}}$ has 3
spikes.
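
The construction can be sketched in Python as below. This is an illustration under stated assumptions: spike times are integer ms slots, each spike of the base train goes to the true or false train with probability 1/2, and output pairs are obtained by simple rejection sampling. The function names `base_train`, `split_train` and `output_pair` are hypothetical.

```python
import random

def base_train(rate, duration=100, min_isi=10, rng=random):
    """Base train: spike probability `rate` per 1 ms slot, gaps of at least min_isi ms."""
    times, last = [], None
    for t in range(duration):
        if last is not None and t - last < min_isi:
            continue  # still inside the minimum inter-spike interval
        if rng.random() < rate:
            times.append(t)
            last = t
    return times

def split_train(train, rng=random):
    """Distribute the spikes of a base train at random over a true and a false train."""
    s_true, s_false = [], []
    for t in train:
        (s_true if rng.random() < 0.5 else s_false).append(t)
    return s_true, s_false

def output_pair(rate=0.06, n_spikes=3, quiet=20, rng=random):
    """Draw pairs until both trains have n_spikes spikes and none before `quiet` ms."""
    while True:
        s_true, s_false = split_train(base_train(rate, rng=rng), rng)
        if (len(s_true) == n_spikes and len(s_false) == n_spikes
                and min(s_true + s_false) >= quiet):
            return s_true, s_false
```

Because both trains of a pair are drawn from one base train, the minimum ISI of the base train carries over as a minimum distance between any true spike and any false spike.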

4.7 Epochs and Weight Updates 

One epoch consists of ten input-target pattern pair presentations. For each such presentation, we 
choose randomly logical values for the two input banks $J_0$ and $J_1$, and apply the corresponding
sets of spike trains to the input neurons. The network runs for the simulated 100 ms duration
of the input trains plus 20 ms (two times the maximal synaptic delay). The output spike train
is recorded and, after each presentation, weight changes for all connections between hidden and
output layer are calculated (but not applied) with eq. 1 and eq. 2. The network is then reset (all
neurons set to $V_r = -60$ mV), and the next input-target pair selected.

At the end of an epoch, that is, after each 10 presentations of input-target pairs, the accumu- 
lated weight changes are applied to the weights between the hidden and output layer. Also, the 
average rate of the hidden neurons over this epoch is checked and weights between the input and 
the hidden layer scaled with eq. [4] if necessary. Finally, in each epoch, we test the network on all 
pattern pairs (four for a logical operation), record the results and calculate two error measures. 

4.8 Error Measures 

Unlike gradient-descent algorithms that start from an explicit error measure between actual and 
desired output, for ReSuMe there is no such natural choice since it starts from a pair of STDP 
processes. Although ReSuMe is motivated with the δ-rule, this does not provide an immediate
error measure since pairing of actual and desired output spikes is not obvious or even possible. 
Errors in the simulations are therefore measured as follows: 

1. Spike Train Error (STE): Our primary error measure for the difference between actual and
desired spike trains is from van Rossum [24]. It accounts for additional and missing spikes as
well as a close match of spike times. Given a spike train $S$ as an ordered set of spike times,
we can easily view it as a function in time:

$$S(t) = \sum_{t' \in S} \delta(t - t') \qquad (6)$$

$S$ is convolved with $f(t) = e^{-t/\tau_c} H(t)$ ($H$ is the Heaviside function) where the discrete
convolution runs over the length of 120 ms:

$$(f * S)(t) = \sum_{s:\, 0 \le t - s \le 120} f(s)\, S(t - s), \quad 0 \le t \le 120, \qquad (7)$$

$\tau_c = 10$ ms is in the order of the ISI of input and target trains. The distance $R$ between two
spike trains $S$, $T$ is the squared distance between their convolutions

$$R(S, T) := \sum_{0 \le t \le 120} \big( (f * S)(t) - (f * T)(t) \big)^2 \qquad (8)$$

Finally, the STE is the sum of the distances $R$ between the actual output and the target
train for all four test cases.

2. The Logic Error (LE) is the count of wrong outputs: We count an output train $S^{(a)}$ as
correct if it is closer to the spike train $S_q^{(d)}$ of the target logical value $q$ than to $S_{\neg q}^{(d)}$, that is if
$R(S^{(a)}, S_q^{(d)}) < R(S^{(a)}, S_{\neg q}^{(d)})$. LE is the number of output trains in the four test cases that
are not correct with this criterion, so it ranges from 0 to 4.

Figure 2: Average STE and LE for logical operations XOR and AND for the three-layer networks
with 20 hidden neurons and 12 inputs. The STE graphs are clipped at 10 as this error is only
exceeded for the first 20 epochs as discussed in the main text.



STE is used to generally measure how closely an actual output spike train matches its target spike 
train, while LE is our criterion to decide whether we accept an actual output train as the correct 
response of the network, namely when it is closer to the target train than to the non-target train. 
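
Both error measures can be sketched compactly in Python. The sketch assumes spike times in ms on a 1 ms grid from 0 to 120 and evaluates the exponential kernel directly at the grid points; all function names are hypothetical.

```python
import math

def filtered(train, tau_c=10.0, length=120):
    """Spike train convolved with exp(-t / tau_c) H(t), sampled at t = 0..length (eq. 6, eq. 7)."""
    trace = [0.0] * (length + 1)
    for spike in train:
        start = max(0, math.ceil(spike))  # the kernel vanishes before the spike
        for t in range(start, length + 1):
            trace[t] += math.exp(-(t - spike) / tau_c)
    return trace

def distance(s, t, tau_c=10.0, length=120):
    """Squared distance R(S, T) between the two filtered trains (eq. 8)."""
    fs, ft = filtered(s, tau_c, length), filtered(t, tau_c, length)
    return sum((a - b) ** 2 for a, b in zip(fs, ft))

def spike_train_error(actual_trains, target_trains):
    """STE: sum of R between actual and target train over all test cases."""
    return sum(distance(a, d) for a, d in zip(actual_trains, target_trains))

def logic_error(actual_trains, target_values, s_true, s_false):
    """LE: number of outputs that are not strictly closer to their target train."""
    errors = 0
    for actual, q in zip(actual_trains, target_values):
        target, other = (s_true, s_false) if q else (s_false, s_true)
        if not distance(actual, target) < distance(actual, other):
            errors += 1
    return errors
```

A network that always emits the true train on an XOR test set would score LE = 2 under this criterion, which is the random level referred to in the discussion of the learning curves.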



5 Simulations 

For each of the four logical operations XOR, AND, $J_0$ and TRUE, we trained three-layer networks
and two-layer networks with the following configuration: 

1. three-layer networks with 2x6 inputs and 20 hidden neurons. 

2. two-layer networks with 2x10 input neurons. 

For each configuration and logical operation, 100 networks were run for 2000 epochs, and each run
had a different random weight initialisation. Each run also had its individual random set of spike
trains for input banks and outputs as described in section 4.6.

Our main interest is certainly in networks with three layers that are trained on XOR. The 
other logical operations and network configurations serve as control cases. 



5.1 Discussion 

Figures 2 and 3 present average learning curves for STE and LE for the two network configurations
for logical operations XOR and AND, averaged over 100 runs. Learning curves for $J_0$ and TRUE
are very similar to AND for each network and are therefore not shown.

Figure 3: Average STE and LE for XOR and AND for the two-layer network with 20 input neurons
and without hidden layer. The STE graphs are clipped at 10 as this error is only exceeded for the
first 20 epochs as discussed in the main text.

Table 1: Mean STE and LE averaged over epochs 900-999 and 1900-1999 and 100 networks for
each condition. Most significant digit of error of the mean shown in ().

(a) 12 inputs, 20 hidden neurons

Operation  Error  900-999    1900-1999
AND        STE    3.37(2)    2.35(3)
           LE     0.170(3)   0.076(2)
$J_0$      STE    3.84(2)    2.83(3)
           LE     0.230(4)   0.149(3)
TRUE       STE    3.32(2)    2.55(3)
           LE     0.161(3)   0.078(3)
XOR        STE    3.55(2)    3.08(3)
           LE     0.200(4)   0.157(2)

(b) 20 inputs, no hidden layer

Operation  Error  900-999    1900-1999
AND        STE    1.98(1)    0.41(1)
           LE     0.104(2)   0.022(1)
$J_0$      STE    1.29(1)    0.37(1)
           LE     0.047(2)   0.007(1)
TRUE       STE    0.570(9)   0.084(2)
           LE     0.010(1)
XOR        STE    6.39(1)    5.70(1)
           LE     2.012(8)   1.994(7)

(c) Control case, 12 inputs, 12 hidden neurons (d) Control Case: 12 inputs, no hidden layer 

The course of the STE curves is similar for all cases: From about epoch 20 the STE errors are
below 10 and decay first relatively steeply and then more slowly. However, the STE for the three-layer
networks starts with a low error value in the first epochs which rises steeply to very high values up
to 400 and then rapidly decays to values below 10 in the first 20 epochs (not shown, STE graphs
clipped at 10). The reason for the marked peak in three-layer networks only is that initial weights
were chosen so that no output spikes are generated at all. As weights increase by the scaling of eq. 4 and
as more output spikes are generated, the errors increase until most of them are removed again via
ReSuMe learning (eq. 1 and eq. 2). For LE the picture is similar; however, since it tests the interpretation
of the output, LE always starts at the random level 2, perhaps slightly increasing beyond that around epoch 20, and then
decreases to lower values (with the exception of XOR for the two-layer networks, see below). All
in all, errors decrease in a way similar to many other supervised network learning algorithms.

The STE for the XOR problem in the three-layer network reaches a plateau after about 1500 epochs and does
not decay after that, see fig. 2(a), while the STE learning curve for XOR in the two-layer networks
is still decreasing after 2000 epochs, see fig. 3(a). Additional simulations for up to 10000 epochs,
however, confirmed that the STE reaches a minimum after about 3000 epochs for the two-layer
networks, and that LE does not change at all, but fluctuates around the random performance 
value 2 throughout all epochs. 

It may be seen that all curves are very rough in nature and this is discussed below. Tables 1(a)
and (b) summarise the simulation results in terms of STE and LE averaged over all 100 networks 
for a given configuration and the 100 epochs from 900-999 and 1900-1999 respectively. 

The amount of activation that the output neuron can receive from its predecessor layer is similar
for the three-layer networks (with 20 neurons in the hidden layer) and the two-layer networks
(with 20 neurons in the input layer), so a comparison between them is of interest. We now discuss
the different logical operations. The two-layer network performs best on all operations except
XOR, where its LE stays at the random level throughout training. In other words, the networks
without hidden layer are not able to learn the XOR problem reliably.

The XOR operation reaches an LE value of between 0.157 and 0.200 for the three-layer networks.
If we assume that networks with LE > 0 have at least LE = 1, then this indicates that more than
a fraction of 0.843 = 1 - 0.157 of these networks produce no logical error on average for any
epoch from 1900-1999. In other words, the three-layer networks can learn the XOR problem,
although not with a reliability fit for technical purposes.
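
The bound can be made explicit with a short calculation. The symbols N (number of networks), LE_i (logical error of network i in a given epoch) and p_0 (fraction of networks without logical error) are introduced here for illustration only, and the step assumes that every network with an error contributes LE_i >= 1.

```latex
\begin{align*}
\overline{\mathrm{LE}}
  &= \frac{1}{N}\sum_{i=1}^{N} \mathrm{LE}_i
   \;\ge\; \frac{1}{N}\,\bigl|\{\, i : \mathrm{LE}_i \ge 1 \,\}\bigr|
   = 1 - p_0 , \\
p_0 &\ge 1 - \overline{\mathrm{LE}} = 1 - 0.157 = 0.843 .
\end{align*}
```

The same argument applied to the upper value LE = 0.200 gives a fraction of at least 0.800.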

As to the other logical operations, TRUE and I0 do best in two-layer networks. AND and
XOR are the operations where information from the two inputs I_0 and I_1 needs to be combined,
and they are learned with a somewhat higher STE and LE. For the three-layer network,
I0 is hardest to learn. With a large hidden layer that mixes spikes from both input banks it might
be more difficult to find enough spikes that convey information about only one of the input banks.

Overall, relative differences between logical operations are lower for the three-layer networks.
Hence the difficulty of matching a spike train outweighs the difficulty of performing the computation.

5.2 Error Roughness 

Despite averaging over 100 networks, it is obvious that the learning curves are not as smooth as for
rate neuron learning algorithms. This roughness stems from that of individual learning curves, see
fig. 4 for a typical successful network trained on the XOR task. Individual networks occasionally
lose a good solution and find it again later. This can lead to large changes in STE. However, changes
in LE might not be as pronounced as in STE, see fig. 4(c). These abrupt changes are an effect
of the discontinuous nature of spike events: relatively large discrete jumps in the errors of
individual networks are expected when a slight change in the weights creates or removes an
output spike.
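
The effect of discrete spike events can be illustrated with a minimal sketch. It assumes a generic leaky integrate-and-fire neuron with instantaneous synapses, not the neuron model or the parameters of the simulations reported here; `count_output_spikes`, the input spike times and all constants are hypothetical and chosen only for illustration.

```python
import math


def count_output_spikes(weight, input_times, t_max=100.0, dt=0.1,
                        tau=10.0, threshold=1.0):
    """Count output spikes of a simple leaky integrate-and-fire neuron.

    Hypothetical toy model: each input spike instantly raises the membrane
    potential by `weight`; the potential leaks with time constant `tau`
    and is reset to 0 after an output spike.
    """
    n_steps = int(round(t_max / dt))
    input_steps = {int(round(t / dt)) for t in input_times}
    decay = math.exp(-dt / tau)
    potential = 0.0
    n_spikes = 0
    for step in range(n_steps):
        potential *= decay                 # passive leak
        if step in input_steps:
            potential += weight            # instantaneous synaptic input
        if potential >= threshold:
            n_spikes += 1                  # output spike
            potential = 0.0                # reset
    return n_spikes


# Sweep the weight in small, smooth steps and record the output spike count.
input_times = [5.0, 8.0, 11.0, 30.0, 33.0, 60.0]
weights = [0.2 + 0.02 * i for i in range(51)]
counts = [count_output_spikes(w, input_times) for w in weights]

for w, c_prev, c in zip(weights[1:], counts, counts[1:]):
    if c != c_prev:
        print(f"weight {w:.2f}: spike count changes from {c_prev} to {c}")
```

Although the weight changes smoothly, the number of output spikes can only change in whole steps, so any error measure computed from the output spike train changes abruptly at these weight values.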

5.3 Further Control Cases 

We also ran two further network configurations as control cases, namely a three-layer network still
with 12 inputs, but only 12 neurons in the hidden layer, and a two-layer network with only 12
inputs. Learning curves were qualitatively similar to those of the other networks, but performance
(see tables (c) and (d)) was worse than for their counterparts with a bigger hidden or input layer.

That the three-layer network with 20 hidden neurons and 12 inputs performs better on all
logical operations than the two-layer network with 12 inputs demonstrates that, as for rate neuron
networks, a hidden layer is useful to preprocess and mix inputs even though the total information
fed into the networks is the same.



6 Conclusion 

To our knowledge, the present simulations are the first example of supervised learning in
layered spiking neural networks where inputs and outputs are encoded as spike trains
of multiple spikes. This work extends and builds on other supervised learning algorithms for
spiking neural networks like ReSuMe and SpikeProp. Restrictions on spike patterns in SpikeProp and
its extensions (one latency-coded output spike) are more severe than in the present simulations
(three timed output spikes). ReSuMe has so far only been used on single neurons or on read-outs
of LSM, and it had not been attempted to implement a simple but non-trivial computation like the
XOR operation with it. SpikeProp suffers from silent neurons in the hidden layer for which no error
signals can be obtained. We sidestep a similar problem by scaling weights multiplicatively so that
firing rates are kept within a specified range.
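
The idea of keeping activity within a range by multiplicative scaling can be sketched as follows. This is a minimal illustration under stated assumptions, not the scaling equation used in the simulations: the function name `scale_weights`, the spike-count bounds and the scaling factor are hypothetical.

```python
def scale_weights(weights, spike_count, min_spikes, max_spikes, factor=0.05):
    """Hypothetical multiplicative scaling of a neuron's incoming weights.

    If the neuron fired too few spikes, all incoming weights are scaled up;
    if it fired too many, they are scaled down; otherwise they are unchanged.
    """
    if spike_count < min_spikes:
        return [w * (1.0 + factor) for w in weights]   # too quiet: scale up
    if spike_count > max_spikes:
        return [w / (1.0 + factor) for w in weights]   # too active: scale down
    return list(weights)                               # within range


# A silent neuron gets its incoming weights increased step by step until
# it starts to fire, at which point an error signal becomes available.
weights = [0.5, 1.0]
weights = scale_weights(weights, spike_count=0, min_spikes=1, max_spikes=10)
```

Because the update is multiplicative, the relative sizes of the incoming weights are preserved while the overall drive to the neuron changes.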

Our results indicate that on average more than 80% of the three-layer networks in any one of
the final 100 epochs compute the XOR operation, which is not linearly separable, correctly, while
two-layer networks do not. This extends a similar observation for layered networks of rate
neurons [16]. However, the roughness of the learning curves suggests that networks frequently lose
and find again a good solution.
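
The analogous situation for simple threshold units can be checked with a small search. The sketch below concerns a single binary threshold unit, not a spiking network, and the weight grid is an arbitrary illustrative choice; the grid search only illustrates the standard result that XOR is not linearly separable and does not prove it. All function names are hypothetical.

```python
from itertools import product

PATTERNS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def threshold_unit(w0, w1, bias, x0, x1):
    """Binary threshold unit: outputs 1 if the weighted sum is positive."""
    return 1 if w0 * x0 + w1 * x1 + bias > 0 else 0


def find_single_unit(target):
    """Search a coarse grid for one threshold unit that computes `target`."""
    grid = [i / 2 for i in range(-8, 9)]          # -4.0, -3.5, ..., 4.0
    for w0, w1, bias in product(grid, repeat=3):
        if all(threshold_unit(w0, w1, bias, x0, x1) == target(x0, x1)
               for x0, x1 in PATTERNS):
            return (w0, w1, bias)
    return None


def xor_with_hidden_layer(x0, x1):
    """XOR built from two hidden threshold units (OR and AND) and one output."""
    h_or = threshold_unit(1.0, 1.0, -0.5, x0, x1)
    h_and = threshold_unit(1.0, 1.0, -1.5, x0, x1)
    return threshold_unit(1.0, -1.0, -0.5, h_or, h_and)


and_solution = find_single_unit(lambda a, b: a & b)   # a solution exists
xor_solution = find_single_unit(lambda a, b: a ^ b)   # no solution on the grid
```

A single unit is found for AND but none for XOR, while two hidden units suffice to compute XOR.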

(a) STE for all epochs, clipped at 20. (b) STE, enlargement of (a) for epochs 750-1500.
(c) LE for all epochs. (d) LE, enlargement of (c) for epochs 750-1500.
(e) Spike trains for epochs 750-1500.

Figure 4: Typical individual learning curves and spike train evolution for a single network with
12 inputs and 20 hidden neurons trained on XOR. (a)/(c) The network is generally performing
well from about 750 epochs; however, it loses a solution that is close to the target spike trains for
short times, indicated by the peaks of STE. LE is less affected. (b)/(d) Enlargement
of (a)/(c) for epochs 750-1500. Vertical dotted lines indicate the peaks of STE. LE deviates
from 0 immediately before the STE peaks and then returns to 0. Towards the end of the cascade
of peaks the network settles into a closer approximation of the target spike trains with STE
frequently at about 1.5. (e) Spike time evolution for epochs 750-1500: the actual output spike
times are represented as rings. Desired spike times are represented as solid horizontal lines.
These are covered heavily with actual spikes that hit the right spike time. This graph overlays
actual and desired output spike times for all XOR input-output patterns 000, 011, 101, 110. That
all desired spike times seem to be hit does not imply that any single output train hits all target
times. Vertical dotted lines again indicate the epochs with STE peaks. These are correlated with
a reorganisation of spike time patterns.

ReSuMe as applied to layered networks is certainly not reliable enough for technical purposes
or even for information processing in the nervous system. However, we have so far only considered
a single output spike train. Even if a single neuron and its spike train are individually not
reliable, an ensemble of them may be. It has been observed that neurons driven with the same
inputs can be trained to produce different spike trains [11]. It is therefore possible that a bank
of output neurons driven from the same hidden layer, but producing different spike trains for the
same logical value, can be trained successfully, as the incoming weights of the output neurons and
their targets are independent. This ensemble would represent the true or false output more
reliably than any neuron on its own [19]. Using multiple output spike trains would also mirror the
use of banks of inputs. In addition, this is more realistic, as in nature neurons act collectively
to encode information and omission or addition of single spikes does not seem to be critical [?].
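
How much an ensemble could gain can be estimated with a simple binomial calculation. This is a sketch under a strong assumption that is not tested here: the output neurons are treated as statistically independent, each correct with the same probability, and the ensemble output is read by majority vote. Neurons driven from the same hidden layer will in general be correlated, so the numbers are an optimistic illustration only. The function name and the value 0.8 are hypothetical.

```python
from math import comb


def majority_correct(p_single, n_neurons):
    """Probability that a majority of independent neurons is correct.

    Assumes each of `n_neurons` output neurons is correct independently
    with probability `p_single`; `n_neurons` must be odd to avoid ties.
    """
    if n_neurons % 2 == 0:
        raise ValueError("use an odd ensemble size to avoid ties")
    majority = n_neurons // 2 + 1
    return sum(
        comb(n_neurons, k) * p_single**k * (1.0 - p_single)**(n_neurons - k)
        for k in range(majority, n_neurons + 1)
    )


# Reliability of a majority vote for growing (odd) ensemble sizes.
for n in (1, 5, 11):
    print(n, round(majority_correct(0.8, n), 5))
```

Under these assumptions the reliability of the vote grows with the ensemble size whenever a single neuron is correct more often than not.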

While it is often clear that a given network structure is Turing-equivalent, it is less clear what
computations can be learnt and how they are implemented on a given network in a natural way.
There has been a successful stream of research analysing what computational representations a
rate network evolves for a given computational problem [?]. In this spirit, we believe it is
now time to explore spiking neural networks and their computational capabilities.